This file describes the replication materials for Bisbee and Larson, “Testing Social Science Network Theories with Online Network Data”. 

All published results can be replicated from the zipped folder that contains this README.txt file. 

For users interested in simply running a single Stata .do file, the 'master_replication.do' file will execute all relevant Stata commands and R scripts in the appropriate sequence, yielding all figures and tables presented in the paper. 

For users interested in exploring specific results in detail, the compressed folder also includes all .csv, .dta, and .RData materials necessary to reproduce any given figure or table in isolation. Also included in the compressed folder is an organizational chart that details which figures and tables are produced by which R scripts, along with the precursor datasets each requires. For clarification on any part, please contact james.bisbee@nyu.edu.

In the following sections, we describe each script in the sequence necessary to go from the raw survey results to the final results. 

NOTE 1: For users with Stata 14 or above, please make sure to uncomment the //version(12) command wherever you find it in order to open .dta files in R. (Disregard if you live in the future and R's foreign package has learned to play nicely with .dta files of all generations.)

NOTE 2: Please pay attention to the directory references throughout, as well as the R versions referenced in the master_replication.do file, and adjust accordingly to reflect your folder structure / R version. 

NOTE 3: There are parts of the analysis that rely on randomness, either to produce jittered points on a plot or to randomly select divisions for cross-validation against which to compare online/offline splits. In these cases, replicated values may differ slightly from those reported in the paper. The authors are confident that such deviations will be small and the main conclusions from the paper will obtain. 
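
Users who want identical output across their own repeated runs can fix R's random seed before the random steps. The sketch below is only an illustration: the seed value and the object names are hypothetical and do not come from the replication scripts, and fixing a seed will not by itself recover the exact draws behind the published figures.

```r
# Hypothetical illustration: neither the seed value nor these objects
# come from the replication scripts.
set.seed(20190101)                  # fix the random number stream

x <- c(1, 1, 2, 2, 3, 3)
jittered_a <- jitter(x)             # random offsets, as in a jittered plot

set.seed(20190101)                  # reset to the same seed
jittered_b <- jitter(x)

identical(jittered_a, jittered_b)   # TRUE: same seed, same draws
```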

1) data_prep.do: This file takes the raw survey results from Qualtrics and cleans the data. It removes respondents who failed the attention tests and those who completed the survey faster than would be possible while paying the necessary attention; coarsens rich subject demographic information into dummy variables; extracts useful tie measures from the dinner table seating question based on the arrangement of the subject relative to the five elicited ties; creates an index of introversion based on 9 questions; creates the inverse propensity score weights based on CHAID random forests; and saves the final data set in both long and wide forms. Please allow several minutes for the program to run, depending on computing power.

2) descriptive_replication.do: This file takes the final_data_clean.dta file produced above and creates summary statistics (presented in Table 2 in the paper) as well as the dataset of correlations between the different measures of tie strength that is used to generate Figure 3 via the descriptive_results.R script.

3) characteristics_replication.do: This file does two things. First, it streamlines the cleaned data for analysis in the chars_results.R script. Second, it creates measures of the difference in tie strength by dimension between each subject's weakest and strongest ties, which are used by the chars_results.R script to create Figure 5. 

4) replationships_replication.do: This file regresses donation amounts between each subject and each elicited tie type on different measures of the tie strength between the subject and that tie, using CEM weights to improve precision. These results are saved for use by the rels_results.R script to create Figure 8.

5) protest_replication.do: This file prepares the protest outcome for causal analysis in R via the protest_results.R script. In addition, this file estimates non-causal correlations between dimensions of tie strength and protest behaviors based on open-ended survey questions. These correlations are saved as .tex files, which constitute Tables 2 and 3 in the Supporting Information. 

6) descriptive_results.R: This R script generates the summary statistics presented in Table 2 from the summary_stats.dta file created by the descriptive_replication.do file described above in (2). In addition, it uses the correlation-long.dta file created by the descriptive_replication.do file described above in (2) to generate Figure 3. Finally, it uses the validation results ("cleaned-full.csv") to generate Figure 1 and SI Figure 2. 

7) chars_results.R: This R script creates the main table (Table 3) used to test differences in measures of tie strength in the paper. In addition, the R script uses free step-down resampling to account for multiple comparisons and generate Figure 4 as well as Figure 1 in the Supporting Information. Finally, the R script uses the dinner table seating question to explore structural differences in Figures 6 and 7. Please allow several minutes to run, depending on computing power. Note that for the chars_adjpvals.pdf figure (Figure 4 in the paper), the placement of the jittered points may differ across replications due to differences in random draws.

8) rels_results.R: This R script summarizes the difference in relationships between donation behavior and each measure of tie strength in both the online and offline conditions and then tests whether these estimates are significantly different (Figure 8). In addition, this script uses structural equation modeling (SEM) to arrive at a more robust estimate of the relationship between dimensions of tie strength and donation decisions using the full data. These results are presented in Figure 4 in the Supporting Information.

9) models_results.R: This R script divides the data in half at random and compares models fit on these random divisions against models fit on the online data and used to predict donation behavior in the offline data. The R script then uses the output of these simulations to generate Figures 9 and 10 in the paper. Note that the randomness in the cross-validation splits may produce slightly different results than those reported in the paper, due only to the randomness of the sub-samples. 

10) protest_results.R: This R script takes the simplified protest data and produces causal estimates via both bivariate regressions and structural equation modeling (SEM). The output is presented in Figure 5 in the Supporting Information. 
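
For readers unfamiliar with the free step-down resampling mentioned under chars_results.R (7), the following is a minimal sketch of one common form of the idea, usually associated with Westfall and Young: a step-down adjustment based on the maximum permuted test statistic. It is not the code in chars_results.R. The data are simulated, and all names (group, y, abs_t, n_perm) as well as the choice of permutation resampling and t statistics are assumptions made for illustration.

```r
# Illustrative sketch only: simulated data, hypothetical names.
set.seed(2)                                # arbitrary seed
n <- 100
k <- 4
group <- rep(c(0, 1), each = n / 2)        # stand-in for an online/offline indicator
y <- matrix(rnorm(n * k), n, k)            # one column per simulated outcome
y[, 1] <- y[, 1] + 1.5 * group             # only the first outcome truly differs

# Absolute two-sample t statistic for every column of y
abs_t <- function(y, g) {
  apply(y, 2, function(col) abs(t.test(col[g == 1], col[g == 0])$statistic))
}

t_obs <- abs_t(y, group)
ord <- order(t_obs, decreasing = TRUE)     # most to least significant

n_perm <- 500
exceed <- numeric(k)
for (b in seq_len(n_perm)) {
  # Permute group labels, keep statistics in the observed ordering
  t_perm <- abs_t(y, sample(group))[ord]
  # Step-down: successive maxima from the least significant outcome upward
  q <- rev(cummax(rev(t_perm)))
  exceed <- exceed + (q >= t_obs[ord])
}

p_adj_sorted <- cummax(exceed / n_perm)    # enforce monotone adjusted p-values
p_adj <- numeric(k)
p_adj[ord] <- p_adj_sorted                 # back to the original column order
p_adj
```

The adjusted p-value for each outcome is the share of permutations in which the largest permuted statistic among that outcome and all less significant ones is at least as large as the observed statistic.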
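
The comparison described under models_results.R (9) can be sketched roughly as follows. This is a hedged illustration on simulated data rather than the script itself: the variable names (online, strength, donation), the linear model, the mean squared error criterion, and the number of random splits are all assumptions.

```r
# Illustrative sketch only: simulated data, hypothetical names.
set.seed(1)                                # arbitrary seed
n <- 200
dat <- data.frame(
  online   = rep(c(1, 0), each = n / 2),   # stand-in for the online/offline condition
  strength = rnorm(n)                      # stand-in for a tie strength measure
)
dat$donation <- 2 + 0.5 * dat$strength + rnorm(n)

# Out-of-sample mean squared error: fit on train, predict test
oos_mse <- function(train, test) {
  fit <- lm(donation ~ strength, data = train)
  mean((test$donation - predict(fit, newdata = test))^2)
}

# Fit on the online half, predict the offline half
mse_online <- oos_mse(dat[dat$online == 1, ], dat[dat$online == 0, ])

# Reference distribution: fit on a random half, predict the other half
mse_random <- replicate(500, {
  idx <- sample(n, n / 2)
  oos_mse(dat[idx, ], dat[-idx, ])
})

# Share of random splits that predict at least as badly as the online/offline split
mean(mse_random >= mse_online)
```

Because the random halves are redrawn on every run, the reference distribution shifts slightly from run to run, which is the source of the small differences noted in NOTE 3.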